Goto

Collaborating Authors

 stackelberg strategy




A Properties and separations for generalized equilibria A.1 Proof of Proposition 1 Proof. The set of (Φ

Neural Information Processing Systems

If a player receives substantially more or less than the corresponding value, this would imply a violation of the regret constraints for at least one of the players' learning algorithms. Theorem 8. F or each of the following, there exists a The corresponding values for this game are simple to compute: 1. Val If either player violates the other's trust o (T) times, then the player defects to playing L First we elaborate upon the trusting phase. This process repeats for every window. We now show that this algorithm satisfies both conditions in the theorem statement. The last step follows since NumWindows(T) o (T), because Length (T) o (T).



Is Knowledge Power? On the (Im)possibility of Learning from Strategic Interaction

Ananthakrishnan, Nivasini, Haghtalab, Nika, Podimata, Chara, Yang, Kunhe

arXiv.org Artificial Intelligence

When learning in strategic environments, a key question is whether agents can overcome uncertainty about their preferences to achieve outcomes they could have achieved absent any uncertainty. Can they do this solely through interactions with each other? We focus this question on the ability of agents to attain the value of their Stackelberg optimal strategy and study the impact of information asymmetry. We study repeated interactions in fully strategic environments where players' actions are decided based on learning algorithms that take into account their observed histories and knowledge of the game. We study the pure Nash equilibria (PNE) of a meta-game where players choose these algorithms as their actions. We demonstrate that if one player has perfect knowledge about the game, then any initial informational gap persists. That is, while there is always a PNE in which the informed agent achieves her Stackelberg value, there is a game where no PNE of the meta-game allows the partially informed player to achieve her Stackelberg value. On the other hand, if both players start with some uncertainty about the game, the quality of information alone does not determine which agent can achieve her Stackelberg value. In this case, the concept of information asymmetry becomes nuanced and depends on the game's structure. Overall, our findings suggest that repeated strategic interactions alone cannot facilitate learning effectively enough to earn an uninformed player her Stackelberg value.


Distributed Stackelberg Strategies in State-based Potential Games for Autonomous Decentralized Learning Manufacturing Systems

Yuwono, Steve, Schwung, Dorothea, Schwung, Andreas

arXiv.org Artificial Intelligence

This article describes a novel game structure for autonomously optimizing decentralized manufacturing systems with multi-objective optimization challenges, namely Distributed Stackelberg Strategies in State-Based Potential Games (DS2-SbPG). DS2-SbPG integrates potential games and Stackelberg games, which improves the cooperative trade-off capabilities of potential games and the multi-objective optimization handling by Stackelberg games. Notably, all training procedures remain conducted in a fully distributed manner. DS2-SbPG offers a promising solution to finding optimal trade-offs between objectives by eliminating the complexities of setting up combined objective optimization functions for individual players in self-learning domains, particularly in real-world industrial settings with diverse and numerous objectives between the sub-systems. We further prove that DS2-SbPG constitutes a dynamic potential game that results in corresponding converge guarantees. Experimental validation conducted on a laboratory-scale testbed highlights the efficacy of DS2-SbPG and its two variants, such as DS2-SbPG for single-leader-follower and Stack DS2-SbPG for multi-leader-follower. The results show significant reductions in power consumption and improvements in overall performance, which signals the potential of DS2-SbPG in real-world applications.


The Danger Of Arrogance: Welfare Equilibra As A Solution To Stackelberg Self-Play In Non-Coincidental Games

Levi, Jake, Lu, Chris, Willi, Timon, de Witt, Christian Schroeder, Foerster, Jakob

arXiv.org Artificial Intelligence

The increasing prevalence of multi-agent learning systems in society necessitates understanding how to learn effective and safe policies in general-sum multi-agent environments against a variety of opponents, including self-play. General-sum learning is difficult because of non-stationary opponents and misaligned incentives. Our first main contribution is to show that many recent approaches to general-sum learning can be derived as approximations to Stackelberg strategies, which suggests a framework for developing new multi-agent learning algorithms. We then define non-coincidental games as games in which the Stackelberg strategy profile is not a Nash Equilibrium. This notably includes several canonical matrix games and provides a normative theory for why existing algorithms fail in self-play in such games. We address this problem by introducing Welfare Equilibria (WE) as a generalisation of Stackelberg Strategies, which can recover desirable Nash Equilibria even in non-coincidental games. Finally, we introduce Welfare Function Search (WelFuSe) as a practical approach to finding desirable WE against unknown opponents, which finds more mutually desirable solutions in self-play, while preserving performance against naive learning opponents.


Is Learning in Games Good for the Learners?

Brown, William, Schneider, Jon, Vodrahalli, Kiran

arXiv.org Artificial Intelligence

We consider a number of questions related to tradeoffs between reward and regret in repeated gameplay between two agents. To facilitate this, we introduce a notion of $\textit{generalized equilibrium}$ which allows for asymmetric regret constraints, and yields polytopes of feasible values for each agent and pair of regret constraints, where we show that any such equilibrium is reachable by a pair of algorithms which maintain their regret guarantees against arbitrary opponents. As a central example, we highlight the case one agent is no-swap and the other's regret is unconstrained. We show that this captures an extension of $\textit{Stackelberg}$ equilibria with a matching optimal value, and that there exists a wide class of games where a player can significantly increase their utility by deviating from a no-swap-regret algorithm against a no-swap learner (in fact, almost any game without pure Nash equilibria is of this form). Additionally, we make use of generalized equilibria to consider tradeoffs in terms of the opponent's algorithm choice. We give a tight characterization for the maximal reward obtainable against $\textit{some}$ no-regret learner, yet we also show a class of games in which this is bounded away from the value obtainable against the class of common "mean-based" no-regret algorithms. Finally, we consider the question of learning reward-optimal strategies via repeated play with a no-regret agent when the game is initially unknown. Again we show tradeoffs depending on the opponent's learning algorithm: the Stackelberg strategy is learnable in exponential time with any no-regret agent (and in polynomial time with any no-$\textit{adaptive}$-regret agent) for any game where it is learnable via queries, and there are games where it is learnable in polynomial time against any no-swap-regret agent but requires exponential time against a mean-based no-regret agent.


Equilibria in Epidemic Containment Games

Saha, Sudip (Virginia Tech) | Adiga, Abhijin (Virginia Tech) | Vullikanti, Anil Kumar S. (Virginia Tech)

AAAI Conferences

The spread of epidemics and malware is commonly modeled by diffusion processes on networks. Protective interventions such as vaccinations or installing anti-virus software are used to contain their spread. Typically, each node in the network has to decide its own strategy of securing itself, and its benefit depends on which other nodes are secure, making this a natural game-theoretic setting. There has been a lot of work on network security game models, but most of the focus has been either on simplified epidemic models or homogeneous network structure. We develop a new formulation for an epidemic containment game, which relies on the characterization of the SIS model in terms of the spectral radius of the network. We show in this model that pure Nash equilibria (NE) always exist, and can be found by a best response strategy. We analyze the complexity of finding NE, and derive rigorous bounds on their costs and the Price of Anarchy or PoA (the ratio of the cost of the worst NE to the optimum social cost) in general graphs as well as in random graph models. In particular, for arbitrary power-law graphs with exponent $\beta>2$, we show that the PoA is bounded by $O(T^{2(\beta-1)})$, where $T=\gamma/\alpha$ is the ratio of the recovery rate to the transmission rate in the SIS model. We prove that this bound is tight up to a constant factor for the Chung-Lu random power-law graph model. We study the characteristics of Nash equilibria empirically in different real communication and infrastructure networks, and find that our analytical results can help explain some of the empirical observations.


Computing Optimal Strategies to Commit to in Stochastic Games

Letchford, Joshua (Duke University) | MacDermed, Liam (Georgia Institute of Technology) | Conitzer, Vincent (Duke University) | Parr, Ronald (Duke University) | Isbell, Charles L. (Georgia Institute of Technology)

AAAI Conferences

Significant progress has been made recently in the following two lines of research in the intersection of AI and game theory: (1) the computation of optimal strategies to commit to (Stackelberg strategies), and (2) the computation of correlated equilibria of stochastic games. In this paper, we unite these two lines of research by studying the computation of Stackelberg strategies in stochastic games. We provide theoretical results on the value of being able to commit and the value of being able to correlate, as well as complexity results about computing Stackelberg strategies in stochastic games. We then modify the QPACE algorithm (MacDermed et al. 2011) to compute Stackelberg strategies, and provide experimental results.